Search CORE

89 research outputs found

Recommended from our members

Robust prediction of clinical outcomes using cytometry data.

Author: Butte Atul J
Glicksberg Benjamin S
Hu Zicheng
Publication venue: eScholarship, University of California
Publication date: 01/04/2019
Field of study

MotivationFlow cytometry and mass cytometry are widely used to diagnose diseases and to predict clinical outcomes. When associating clinical features with cytometry data, traditional analysis methods require cell gating as an intermediate step, leading to information loss and susceptibility to batch effects. Here, we wish to explore an alternative approach that predicts clinical features from cytometry data without the cell-gating step. We also wish to test if such a gating-free approach increases the accuracy and robustness of the prediction.ResultsWe propose a novel strategy (CytoDx) to predict clinical outcomes using cytometry data without cell gating. Applying CytoDx on real-world datasets allow us to predict multiple types of clinical features. In particular, CytoDx is able to predict the response to influenza vaccine using highly heterogeneous datasets, demonstrating that it is not only accurate but also robust to batch effects and cytometry platforms.Availability and implementationCytoDx is available as an R package on Bioconductor (bioconductor.org/packages/CytoDx). Data and scripts for reproducing the results are available on bitbucket.org/zichenghu_ucsf/cytodx_study_code/downloads.Supplementary informationSupplementary data are available at Bioinformatics online

eScholarship - University of California

Processing of Electronic Health Records using Deep Learning: A review

Author: Danieletto Matteo
Dudley Joel
Glicksberg Benjamin
Li Li
Mayora Oscar
Osmani Venet
Publication venue
Publication date: 05/04/2018
Field of study

Availability of large amount of clinical data is opening up new research avenues in a number of fields. An exciting field in this respect is healthcare, where secondary use of healthcare data is beginning to revolutionize healthcare. Except for availability of Big Data, both medical data from healthcare institutions (such as EMR data) and data generated from health and wellbeing devices (such as personal trackers), a significant contribution to this trend is also being made by recent advances on machine learning, specifically deep learning algorithms

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

Recommended from our members

Accuracy of medical billing data against the electronic health record in the measurement of colorectal cancer screening rates.

Author: Avila Patrick
Butte Atul J
Glicksberg Benjamin S
Harding-Theobald Emily
Rudrapatna Vivek A
Wang Connie
Publication venue: eScholarship, University of California
Publication date: 01/03/2020
Field of study

ObjectiveMedical billing data are an attractive source of secondary analysis because of their ease of use and potential to answer population-health questions with statistical power. Although these datasets have known susceptibilities to biases, the degree to which they can distort the assessment of quality measures such as colorectal cancer screening rates are not widely appreciated, nor are their causes and possible solutions.MethodsUsing a billing code database derived from our institution's electronic health records, we estimated the colorectal cancer screening rate of average-risk patients aged 50-74 years seen in primary care or gastroenterology clinic in 2016-2017. 200 records (150 unscreened, 50 screened) were sampled to quantify the accuracy against manual review.ResultsOut of 4611 patients, an analysis of billing data suggested a 61% screening rate, an estimate that matches the estimate by the Centers for Disease Control. Manual review revealed a positive predictive value of 96% (86%-100%), negative predictive value of 21% (15%-29%) and a corrected screening rate of 85% (81%-90%). Most false negatives occurred due to examinations performed outside the scope of the database-both within and outside of our institution-but 21% of false negatives fell within the database's scope. False positives occurred due to incomplete examinations and inadequate bowel preparation. Reasons for screening failure include ordered but incomplete examinations (48%), lack of or incorrect documentation by primary care (29%) including incorrect screening intervals (13%) and patients declining screening (13%).ConclusionsBilling databases are prone to substantial bias that may go undetected even in the presence of confirmatory external estimates. Caution is recommended when performing population-level inference from these data. We propose several solutions to improve the use of these data for the assessment of healthcare quality

eScholarship - University of California

Knowledge-Augmented Contrastive Learning for Abnormality Classification and Localization in Chest X-rays with Radiomics using a Feedback Loop

Author: Chen Chongyan
Ding Ying
Glicksberg Benjamin
Han Yan
Peng Yifan
Tewfik Ahmed
Wang Zhangyang
Publication venue
Publication date: 17/10/2021
Field of study

Building a highly accurate predictive model for classification and localization of abnormalities in chest X-rays usually requires a large number of manually annotated labels and pixel regions (bounding boxes) of abnormalities. However, it is expensive to acquire such annotations, especially the bounding boxes. Recently, contrastive learning has shown strong promise in leveraging unlabeled natural images to produce highly generalizable and discriminative features. However, extending its power to the medical image domain is under-explored and highly non-trivial, since medical images are much less amendable to data augmentations. In contrast, their prior knowledge, as well as radiomic features, is often crucial. To bridge this gap, we propose an end-to-end semi-supervised knowledge-augmented contrastive learning framework, that simultaneously performs disease classification and localization tasks. The key knob of our framework is a unique positive sampling approach tailored for the medical images, by seamlessly integrating radiomic features as a knowledge augmentation. Specifically, we first apply an image encoder to classify the chest X-rays and to generate the image features. We next leverage Grad-CAM to highlight the crucial (abnormal) regions for chest X-rays (even when unannotated), from which we extract radiomic features. The radiomic features are then passed through another dedicated encoder to act as the positive sample for the image features generated from the same chest X-ray. In this way, our framework constitutes a feedback loop for image and radiomic modality features to mutually reinforce each other. Their contrasting yields knowledge-augmented representations that are both robust and interpretable. Extensive experiments on the NIH Chest X-ray dataset demonstrate that our approach outperforms existing baselines in both classification and localization tasks.Comment: Accepted by WACV 202

arXiv.org e-Print Archive

Recommended from our members

ROMOP: a light-weight R package for interfacing with OMOP-formatted electronic health record data.

Author: Butte Atul J
Datta Debajyoti
Frazier Remi
Giangreco Nicholas
Glicksberg Benjamin S
Larsen Rick
Lee Nelson
Oskotsky Boris
Rudrapatna Vivek
Tatonetti Nicholas P
Thangaraj Phyllis M
Publication venue: eScholarship, University of California
Publication date: 01/04/2019
Field of study

Objectives:Electronic health record (EHR) data are increasingly used for biomedical discoveries. The nature of the data, however, requires expertise in both data science and EHR structure. The Observational Medical Out-comes Partnership (OMOP) common data model (CDM) standardizes the language and structure of EHR data to promote interoperability of EHR data for research. While the OMOP CDM is valuable and more attuned to research purposes, it still requires extensive domain knowledge to utilize effectively, potentially limiting more widespread adoption of EHR data for research and quality improvement. Materials and methods:We have created ROMOP: an R package for direct interfacing with EHR data in the OMOP CDM format. Results:ROMOP streamlines typical EHR-related data processes. Its functions include exploration of data types, extraction and summarization of patient clinical and demographic data, and patient searches using any CDM vocabulary concept. Conclusion:ROMOP is freely available under the Massachusetts Institute of Technology (MIT) license and can be obtained from GitHub (http://github.com/BenGlicksberg/ROMOP). We detail instructions for setup and use in the Supplementary Materials. Additionally, we provide a public sandbox server containing synthesized clinical data for users to explore OMOP data and ROMOP (http://romop.ucsf.edu)

eScholarship - University of California

Recommended from our members

Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes.

Author: Butte Atul J
Fan Xuancheng
Glicksberg Benjamin S
Goldstein Theodore
Ludwig Dana
Muenzen Kathleen
Norgeot Beau
Oskotsky Boris
Peterson Thomas A
Rutenberg Eugenia
Schenk Gundolf
Schmajuk Gabriela
Sirota Marina
Yazdany Jinoos
Publication venue: eScholarship, University of California
Publication date: 01/01/2020
Field of study

There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained with too small medical text corpora. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and develop a customizable open-source de-identification software called Philter ("Protected Health Information filter"). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods

eScholarship - University of California